Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 3276 |
| Missing cells | 1434 |
| Missing cells (%) | 4.4% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 256.1 KiB |
| Average record size in memory | 80.0 B |
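The missing-cell figures in this table can be cross-checked against the per-column missing counts the report gives for ph, Sulfate and Trihalomethanes. The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not part of the report.

```python
# Figures copied from this report; variable names are illustrative only.
n_rows = 3276
n_cols = 10
missing_per_column = {"ph": 491, "Sulfate": 781, "Trihalomethanes": 162}

# Total missing cells is the sum over the columns that have gaps.
missing_cells = sum(missing_per_column.values())

# Share of missing cells relative to every cell in the table.
total_cells = n_rows * n_cols
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)
print(f"{missing_pct:.1f}%")
```

Both results agree with the "Missing cells" and "Missing cells (%)" rows above.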
Variable types
| NUM | 9 |
|---|---|
| BOOL | 1 |
Warnings
| Warning | Type |
|---|---|
| ph has 491 (15.0%) missing values | Missing |
| Sulfate has 781 (23.8%) missing values | Missing |
| Trihalomethanes has 162 (4.9%) missing values | Missing |
| Hardness has unique values | Unique |
| Solids has unique values | Unique |
| Chloramines has unique values | Unique |
| Conductivity has unique values | Unique |
| Organic_carbon has unique values | Unique |
| Turbidity has unique values | Unique |
Reproduction
| Analysis started | 2022-04-12 19:18:29.424274 |
|---|---|
| Analysis finished | 2022-04-12 19:18:43.829119 |
| Duration | 14.4 seconds |
| Software version | pandas-profiling v2.9.0 |
ph
| Distinct | 2785 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 491 |
| Missing (%) | 15.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.080794504 |
|---|---|
| Minimum | 0 |
| Maximum | 14 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4.487970742 |
| Q1 | 6.093091914 |
| median | 7.036752104 |
| Q3 | 8.062066123 |
| 95-th percentile | 9.789818577 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 1.968974209 |
Descriptive statistics
| Standard deviation | 1.594319519 |
|---|---|
| Coefficient of variation (CV) | 0.2251611055 |
| Kurtosis | 0.7203155798 |
| Mean | 7.080794504 |
| Median Absolute Deviation (MAD) | 0.984116999 |
| Skewness | 0.02563044763 |
| Sum | 19720.01269 |
| Variance | 2.541854728 |
| Monotonicity | Not monotonic |
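Several of the derived statistics for this variable follow directly from other figures in the same tables. The sketch below recomputes them from the reported values as a consistency check; the variable names are illustrative, and the non-missing count is assumed to be the number of observations minus the missing count.

```python
import math

# Figures copied from the report for this variable.
minimum, maximum = 0.0, 14.0
q1, q3 = 6.093091914, 8.062066123
mean, std = 7.080794504, 1.594319519
total = 19720.01269          # "Sum" row
n_present = 3276 - 491       # observations minus missing values (assumption)

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared
mean_from_sum = total / n_present # Mean = Sum / non-missing count

print(value_range, iqr, cv, variance, mean_from_sum)
assert math.isclose(mean_from_sum, mean, abs_tol=1e-5)
```

The recomputed values match the Range, IQR, CV, Variance and Mean rows to the precision printed in the report.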
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 8.55409697 | 1 | < 0.1% | |
| 6.538084087 | 1 | < 0.1% | |
| 5.91580675 | 1 | < 0.1% | |
| 8.136497869 | 1 | < 0.1% | |
| 6.493764175 | 1 | < 0.1% | |
| 6.977405633 | 1 | < 0.1% | |
| 5.489248055 | 1 | < 0.1% | |
| 2.558102799 | 1 | < 0.1% | |
| 7.312109304 | 1 | < 0.1% | |
| 6.704431913 | 1 | < 0.1% | |
| 6.44897109 | 1 | < 0.1% | |
| 8.028304242 | 1 | < 0.1% | |
| 8.616824426 | 1 | < 0.1% | |
| 9.46712903 | 1 | < 0.1% | |
| 6.793698635 | 1 | < 0.1% | |
| 5.486058602 | 1 | < 0.1% | |
| 5.05074838 | 1 | < 0.1% | |
| 6.246117565 | 1 | < 0.1% | |
| 8.142330913 | 1 | < 0.1% | |
| 5.189413669 | 1 | < 0.1% | |
| 7.879543234 | 1 | < 0.1% | |
| 8.801933555 | 1 | < 0.1% | |
| 7.038092306 | 1 | < 0.1% | |
| 5.423318496 | 1 | < 0.1% | |
| 4.838571107 | 1 | < 0.1% | |
| Other values (2760) | 2760 | 84.2% | |
| (Missing) | 491 | 15.0% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | < 0.1% | |
| 0.2274990502 | 1 | < 0.1% | |
| 0.9755779898 | 1 | < 0.1% | |
| 0.9899122129 | 1 | < 0.1% | |
| 1.431781555 | 1 | < 0.1% | |
| 1.757037115 | 1 | < 0.1% | |
| 1.844538366 | 1 | < 0.1% | |
| 1.985383359 | 1 | < 0.1% | |
| 2.128531434 | 1 | < 0.1% | |
| 2.376768076 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 14 | 1 | < 0.1% | |
| 13.54124024 | 1 | < 0.1% | |
| 13.34988856 | 1 | < 0.1% | |
| 13.17540172 | 1 | < 0.1% | |
| 12.24692807 | 1 | < 0.1% | |
| 11.90773983 | 1 | < 0.1% | |
| 11.89807803 | 1 | < 0.1% | |
| 11.62114013 | 1 | < 0.1% | |
| 11.56876797 | 1 | < 0.1% | |
| 11.56316906 | 1 | < 0.1% |
Hardness
| Distinct | 3276 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 196.369496 |
|---|---|
| Minimum | 47.432 |
| Maximum | 323.124 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 47.432 |
|---|---|
| 5-th percentile | 141.7632807 |
| Q1 | 176.8505379 |
| median | 196.9676269 |
| Q3 | 216.6674562 |
| 95-th percentile | 249.6097689 |
| Maximum | 323.124 |
| Range | 275.692 |
| Interquartile range (IQR) | 39.81691834 |
Descriptive statistics
| Standard deviation | 32.87976148 |
|---|---|
| Coefficient of variation (CV) | 0.1674382332 |
| Kurtosis | 0.6157716821 |
| Mean | 196.369496 |
| Median Absolute Deviation (MAD) | 19.84498917 |
| Skewness | -0.03934170478 |
| Sum | 643306.469 |
| Variance | 1081.078715 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 204.8904555 | 1 | < 0.1% | |
| 134.5602761 | 1 | < 0.1% | |
| 170.1909123 | 1 | < 0.1% | |
| 237.4610992 | 1 | < 0.1% | |
| 171.2389255 | 1 | < 0.1% | |
| 197.4281988 | 1 | < 0.1% | |
| 195.7440741 | 1 | < 0.1% | |
| 184.2318535 | 1 | < 0.1% | |
| 187.8732835 | 1 | < 0.1% | |
| 205.1505644 | 1 | < 0.1% | |
| 205.3385456 | 1 | < 0.1% | |
| 205.5634995 | 1 | < 0.1% | |
| 147.490575 | 1 | < 0.1% | |
| 260.8450393 | 1 | < 0.1% | |
| 199.8129989 | 1 | < 0.1% | |
| 191.7084136 | 1 | < 0.1% | |
| 148.8421293 | 1 | < 0.1% | |
| 194.7191859 | 1 | < 0.1% | |
| 204.7837347 | 1 | < 0.1% | |
| 168.0424651 | 1 | < 0.1% | |
| 228.7629452 | 1 | < 0.1% | |
| 169.2144075 | 1 | < 0.1% | |
| 227.2257507 | 1 | < 0.1% | |
| 229.7253484 | 1 | < 0.1% | |
| 169.333843 | 1 | < 0.1% | |
| Other values (3251) | 3251 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 47.432 | 1 | < 0.1% | |
| 73.49223369 | 1 | < 0.1% | |
| 77.4595861 | 1 | < 0.1% | |
| 81.71089527 | 1 | < 0.1% | |
| 94.09130748 | 1 | < 0.1% | |
| 94.81254522 | 1 | < 0.1% | |
| 94.90897713 | 1 | < 0.1% | |
| 97.2809086 | 1 | < 0.1% | |
| 98.3679149 | 1 | < 0.1% | |
| 98.45293051 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 323.124 | 1 | < 0.1% | |
| 317.3381241 | 1 | < 0.1% | |
| 311.3839565 | 1 | < 0.1% | |
| 308.2538329 | 1 | < 0.1% | |
| 307.7060241 | 1 | < 0.1% | |
| 306.6274814 | 1 | < 0.1% | |
| 304.2359121 | 1 | < 0.1% | |
| 303.7026267 | 1 | < 0.1% | |
| 300.2924758 | 1 | < 0.1% | |
| 298.0986795 | 1 | < 0.1% |
Solids
| Distinct | 3276 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 22014.09253 |
|---|---|
| Minimum | 320.9426113 |
| Maximum | 61227.19601 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 320.9426113 |
|---|---|
| 5-th percentile | 9545.812579 |
| Q1 | 15666.6903 |
| median | 20927.83361 |
| Q3 | 27332.76213 |
| 95-th percentile | 38474.99025 |
| Maximum | 61227.19601 |
| Range | 60906.2534 |
| Interquartile range (IQR) | 11666.07183 |
Descriptive statistics
| Standard deviation | 8768.570828 |
|---|---|
| Coefficient of variation (CV) | 0.3983162521 |
| Kurtosis | 0.4428260858 |
| Mean | 22014.09253 |
| Median Absolute Deviation (MAD) | 5809.471858 |
| Skewness | 0.6216344855 |
| Sum | 72118167.12 |
| Variance | 76887834.36 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 20791.31898 | 1 | < 0.1% | |
| 15979.33479 | 1 | < 0.1% | |
| 37000.95567 | 1 | < 0.1% | |
| 18736.1909 | 1 | < 0.1% | |
| 12289.90092 | 1 | < 0.1% | |
| 15979.06027 | 1 | < 0.1% | |
| 12431.80311 | 1 | < 0.1% | |
| 30031.83918 | 1 | < 0.1% | |
| 29532.615 | 1 | < 0.1% | |
| 19821.33837 | 1 | < 0.1% | |
| 25142.73374 | 1 | < 0.1% | |
| 16100.96795 | 1 | < 0.1% | |
| 21316.50673 | 1 | < 0.1% | |
| 11803.7355 | 1 | < 0.1% | |
| 14540.73508 | 1 | < 0.1% | |
| 32112.56987 | 1 | < 0.1% | |
| 13329.03225 | 1 | < 0.1% | |
| 18344.06944 | 1 | < 0.1% | |
| 20408.4856 | 1 | < 0.1% | |
| 18564.37206 | 1 | < 0.1% | |
| 19126.29854 | 1 | < 0.1% | |
| 33365.31542 | 1 | < 0.1% | |
| 14470.05355 | 1 | < 0.1% | |
| 22444.55941 | 1 | < 0.1% | |
| 19168.52677 | 1 | < 0.1% | |
| Other values (3251) | 3251 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 320.9426113 | 1 | < 0.1% | |
| 728.7508296 | 1 | < 0.1% | |
| 1198.943699 | 1 | < 0.1% | |
| 1351.906979 | 1 | < 0.1% | |
| 1372.091043 | 1 | < 0.1% | |
| 2552.962804 | 1 | < 0.1% | |
| 2808.025756 | 1 | < 0.1% | |
| 2835.303165 | 1 | < 0.1% | |
| 2912.211247 | 1 | < 0.1% | |
| 3413.081633 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 61227.19601 | 1 | < 0.1% | |
| 56867.85924 | 1 | < 0.1% | |
| 56488.67241 | 1 | < 0.1% | |
| 56351.3963 | 1 | < 0.1% | |
| 56320.58698 | 1 | < 0.1% | |
| 55334.7028 | 1 | < 0.1% | |
| 53735.89919 | 1 | < 0.1% | |
| 52318.9173 | 1 | < 0.1% | |
| 52060.2268 | 1 | < 0.1% | |
| 51731.82055 | 1 | < 0.1% |
Chloramines
| Distinct | 3276 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.122276793 |
|---|---|
| Minimum | 0.352 |
| Maximum | 13.127 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 0.352 |
|---|---|
| 5-th percentile | 4.50305371 |
| Q1 | 6.127420755 |
| median | 7.130298974 |
| Q3 | 8.114887032 |
| 95-th percentile | 9.753100546 |
| Maximum | 13.127 |
| Range | 12.775 |
| Interquartile range (IQR) | 1.987466277 |
Descriptive statistics
| Standard deviation | 1.583084889 |
|---|---|
| Coefficient of variation (CV) | 0.2222723063 |
| Kurtosis | 0.5899011686 |
| Mean | 7.122276793 |
| Median Absolute Deviation (MAD) | 0.9916613427 |
| Skewness | -0.01209844012 |
| Sum | 23332.57878 |
| Variance | 2.506157766 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 7.300211873 | 1 | < 0.1% | |
| 9.504361027 | 1 | < 0.1% | |
| 6.217222542 | 1 | < 0.1% | |
| 5.599870342 | 1 | < 0.1% | |
| 10.78649982 | 1 | < 0.1% | |
| 7.424944591 | 1 | < 0.1% | |
| 6.6616162 | 1 | < 0.1% | |
| 6.21530731 | 1 | < 0.1% | |
| 7.981036899 | 1 | < 0.1% | |
| 6.344963412 | 1 | < 0.1% | |
| 5.639501073 | 1 | < 0.1% | |
| 5.527299246 | 1 | < 0.1% | |
| 9.142233666 | 1 | < 0.1% | |
| 5.260670005 | 1 | < 0.1% | |
| 8.827413554 | 1 | < 0.1% | |
| 8.115355082 | 1 | < 0.1% | |
| 7.118465403 | 1 | < 0.1% | |
| 7.611836667 | 1 | < 0.1% | |
| 4.531581224 | 1 | < 0.1% | |
| 8.562156382 | 1 | < 0.1% | |
| 7.017578359 | 1 | < 0.1% | |
| 8.460489787 | 1 | < 0.1% | |
| 8.471508955 | 1 | < 0.1% | |
| 5.702174923 | 1 | < 0.1% | |
| 8.081496162 | 1 | < 0.1% | |
| Other values (3251) | 3251 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.352 | 1 | < 0.1% | |
| 0.5303512947 | 1 | < 0.1% | |
| 1.390870905 | 1 | < 0.1% | |
| 1.683992581 | 1 | < 0.1% | |
| 1.920271449 | 1 | < 0.1% | |
| 2.102690991 | 1 | < 0.1% | |
| 2.386653494 | 1 | < 0.1% | |
| 2.39798499 | 1 | < 0.1% | |
| 2.456013596 | 1 | < 0.1% | |
| 2.458609195 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 13.127 | 1 | < 0.1% | |
| 13.04380611 | 1 | < 0.1% | |
| 12.91218664 | 1 | < 0.1% | |
| 12.65336202 | 1 | < 0.1% | |
| 12.62689974 | 1 | < 0.1% | |
| 12.58002649 | 1 | < 0.1% | |
| 12.36328483 | 1 | < 0.1% | |
| 12.27937418 | 1 | < 0.1% | |
| 12.2463941 | 1 | < 0.1% | |
| 12.22717528 | 1 | < 0.1% |
Sulfate
| Distinct | 2495 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 781 |
| Missing (%) | 23.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 333.7757766 |
|---|---|
| Minimum | 129 |
| Maximum | 481.0306423 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 129 |
|---|---|
| 5-th percentile | 266.6162317 |
| Q1 | 307.6994978 |
| median | 333.0735457 |
| Q3 | 359.9501704 |
| 95-th percentile | 403.0701898 |
| Maximum | 481.0306423 |
| Range | 352.0306423 |
| Interquartile range (IQR) | 52.25067255 |
Descriptive statistics
| Standard deviation | 41.41684046 |
|---|---|
| Coefficient of variation (CV) | 0.1240858186 |
| Kurtosis | 0.648262815 |
| Mean | 333.7757766 |
| Median Absolute Deviation (MAD) | 26.0951759 |
| Skewness | -0.03594662161 |
| Sum | 832770.5626 |
| Variance | 1715.354674 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 280.7456229 | 1 | < 0.1% | |
| 332.7445192 | 1 | < 0.1% | |
| 391.9182286 | 1 | < 0.1% | |
| 330.9053704 | 1 | < 0.1% | |
| 402.3134271 | 1 | < 0.1% | |
| 360.6978151 | 1 | < 0.1% | |
| 336.0404518 | 1 | < 0.1% | |
| 405.5273372 | 1 | < 0.1% | |
| 346.0636768 | 1 | < 0.1% | |
| 368.5164413 | 1 | < 0.1% | |
| 273.7192819 | 1 | < 0.1% | |
| 380.7253358 | 1 | < 0.1% | |
| 373.6527906 | 1 | < 0.1% | |
| 274.4933955 | 1 | < 0.1% | |
| 298.3799764 | 1 | < 0.1% | |
| 296.0912732 | 1 | < 0.1% | |
| 371.3618512 | 1 | < 0.1% | |
| 366.2144279 | 1 | < 0.1% | |
| 301.2308482 | 1 | < 0.1% | |
| 365.8953377 | 1 | < 0.1% | |
| 342.6040828 | 1 | < 0.1% | |
| 297.239219 | 1 | < 0.1% | |
| 274.6587881 | 1 | < 0.1% | |
| 299.9588455 | 1 | < 0.1% | |
| 325.3239549 | 1 | < 0.1% | |
| Other values (2470) | 2470 | 75.4% | |
| (Missing) | 781 | 23.8% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 129 | 1 | < 0.1% | |
| 180.2067464 | 1 | < 0.1% | |
| 182.3973702 | 1 | < 0.1% | |
| 187.1707144 | 1 | < 0.1% | |
| 187.4241309 | 1 | < 0.1% | |
| 192.0335917 | 1 | < 0.1% | |
| 203.4445208 | 1 | < 0.1% | |
| 205.9350906 | 1 | < 0.1% | |
| 206.2472294 | 1 | < 0.1% | |
| 207.8904823 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 481.0306423 | 1 | < 0.1% | |
| 476.5397173 | 1 | < 0.1% | |
| 475.7374602 | 1 | < 0.1% | |
| 462.474215 | 1 | < 0.1% | |
| 460.107069 | 1 | < 0.1% | |
| 458.4410723 | 1 | < 0.1% | |
| 455.4512337 | 1 | < 0.1% | |
| 450.9144544 | 1 | < 0.1% | |
| 449.2676875 | 1 | < 0.1% | |
| 447.4179624 | 1 | < 0.1% |
Conductivity
| Distinct | 3276 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 426.2051107 |
|---|---|
| Minimum | 181.483754 |
| Maximum | 753.3426196 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 181.483754 |
|---|---|
| 5-th percentile | 300.1094657 |
| Q1 | 365.7344141 |
| median | 421.8849683 |
| Q3 | 481.7923045 |
| 95-th percentile | 566.3493198 |
| Maximum | 753.3426196 |
| Range | 571.8588656 |
| Interquartile range (IQR) | 116.0578904 |
Descriptive statistics
| Standard deviation | 80.82406405 |
|---|---|
| Coefficient of variation (CV) | 0.1896365436 |
| Kurtosis | -0.2770928329 |
| Mean | 426.2051107 |
| Median Absolute Deviation (MAD) | 57.88759119 |
| Skewness | 0.2644902239 |
| Sum | 1396247.943 |
| Variance | 6532.52933 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 564.3086542 | 1 | < 0.1% | |
| 418.6420628 | 1 | < 0.1% | |
| 517.5767619 | 1 | < 0.1% | |
| 235.0422835 | 1 | < 0.1% | |
| 501.5597252 | 1 | < 0.1% | |
| 452.1872326 | 1 | < 0.1% | |
| 367.8540248 | 1 | < 0.1% | |
| 400.6118991 | 1 | < 0.1% | |
| 469.1321169 | 1 | < 0.1% | |
| 482.5957093 | 1 | < 0.1% | |
| 528.2800236 | 1 | < 0.1% | |
| 532.342083 | 1 | < 0.1% | |
| 406.3720189 | 1 | < 0.1% | |
| 424.0584184 | 1 | < 0.1% | |
| 486.9837343 | 1 | < 0.1% | |
| 518.9443622 | 1 | < 0.1% | |
| 403.4209957 | 1 | < 0.1% | |
| 320.3560139 | 1 | < 0.1% | |
| 515.5750971 | 1 | < 0.1% | |
| 419.1315765 | 1 | < 0.1% | |
| 383.5270231 | 1 | < 0.1% | |
| 449.7239517 | 1 | < 0.1% | |
| 360.7556743 | 1 | < 0.1% | |
| 389.0308888 | 1 | < 0.1% | |
| 350.5773702 | 1 | < 0.1% | |
| Other values (3251) | 3251 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 181.483754 | 1 | < 0.1% | |
| 201.6197368 | 1 | < 0.1% | |
| 210.319182 | 1 | < 0.1% | |
| 217.3583296 | 1 | < 0.1% | |
| 232.613624 | 1 | < 0.1% | |
| 233.9079651 | 1 | < 0.1% | |
| 235.0422835 | 1 | < 0.1% | |
| 245.859632 | 1 | < 0.1% | |
| 247.9180305 | 1 | < 0.1% | |
| 251.0208987 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 753.3426196 | 1 | < 0.1% | |
| 708.2263645 | 1 | < 0.1% | |
| 695.369528 | 1 | < 0.1% | |
| 674.4434759 | 1 | < 0.1% | |
| 672.5569992 | 1 | < 0.1% | |
| 669.7250862 | 1 | < 0.1% | |
| 666.6906183 | 1 | < 0.1% | |
| 660.2549463 | 1 | < 0.1% | |
| 657.5704218 | 1 | < 0.1% | |
| 656.9241278 | 1 | < 0.1% |
Organic_carbon
| Distinct | 3276 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 14.28497025 |
|---|---|
| Minimum | 2.2 |
| Maximum | 28.3 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 2.2 |
|---|---|
| 5-th percentile | 8.815361702 |
| Q1 | 12.06580133 |
| median | 14.21833794 |
| Q3 | 16.55765154 |
| 95-th percentile | 19.63725445 |
| Maximum | 28.3 |
| Range | 26.1 |
| Interquartile range (IQR) | 4.49185021 |
Descriptive statistics
| Standard deviation | 3.308161999 |
|---|---|
| Coefficient of variation (CV) | 0.2315834014 |
| Kurtosis | 0.04440930715 |
| Mean | 14.28497025 |
| Median Absolute Deviation (MAD) | 2.232294118 |
| Skewness | 0.02553258209 |
| Sum | 46797.56253 |
| Variance | 10.94393581 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 10.37978308 | 1 | < 0.1% | |
| 12.89763545 | 1 | < 0.1% | |
| 15.87176979 | 1 | < 0.1% | |
| 11.545477 | 1 | < 0.1% | |
| 12.28433352 | 1 | < 0.1% | |
| 18.58495937 | 1 | < 0.1% | |
| 21.30064694 | 1 | < 0.1% | |
| 15.28878163 | 1 | < 0.1% | |
| 16.1692117 | 1 | < 0.1% | |
| 12.16473568 | 1 | < 0.1% | |
| 12.91150923 | 1 | < 0.1% | |
| 10.3465741 | 1 | < 0.1% | |
| 11.51966908 | 1 | < 0.1% | |
| 12.31894242 | 1 | < 0.1% | |
| 17.27165578 | 1 | < 0.1% | |
| 8.567579587 | 1 | < 0.1% | |
| 20.79437133 | 1 | < 0.1% | |
| 12.3332371 | 1 | < 0.1% | |
| 21.55886261 | 1 | < 0.1% | |
| 18.4598902 | 1 | < 0.1% | |
| 14.75925658 | 1 | < 0.1% | |
| 10.39679564 | 1 | < 0.1% | |
| 19.4021494 | 1 | < 0.1% | |
| 15.501543 | 1 | < 0.1% | |
| 15.17753397 | 1 | < 0.1% | |
| Other values (3251) | 3251 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2.2 | 1 | < 0.1% | |
| 4.371898608 | 1 | < 0.1% | |
| 4.466771969 | 1 | < 0.1% | |
| 4.473092264 | 1 | < 0.1% | |
| 4.861631498 | 1 | < 0.1% | |
| 4.902888068 | 1 | < 0.1% | |
| 4.966861619 | 1 | < 0.1% | |
| 5.051694615 | 1 | < 0.1% | |
| 5.159380308 | 1 | < 0.1% | |
| 5.188466455 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 28.3 | 1 | < 0.1% | |
| 27.00670661 | 1 | < 0.1% | |
| 24.75539237 | 1 | < 0.1% | |
| 23.95245044 | 1 | < 0.1% | |
| 23.91760126 | 1 | < 0.1% | |
| 23.66766678 | 1 | < 0.1% | |
| 23.60429797 | 1 | < 0.1% | |
| 23.56964491 | 1 | < 0.1% | |
| 23.51477377 | 1 | < 0.1% | |
| 23.39951606 | 1 | < 0.1% |
Trihalomethanes
| Distinct | 3114 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 162 |
| Missing (%) | 4.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 66.39629295 |
|---|---|
| Minimum | 0.738 |
| Maximum | 124 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 0.738 |
|---|---|
| 5-th percentile | 39.55292835 |
| Q1 | 55.84453562 |
| median | 66.6224851 |
| Q3 | 77.33747291 |
| 95-th percentile | 92.12405947 |
| Maximum | 124 |
| Range | 123.262 |
| Interquartile range (IQR) | 21.49293729 |
Descriptive statistics
| Standard deviation | 16.17500842 |
|---|---|
| Coefficient of variation (CV) | 0.2436131251 |
| Kurtosis | 0.2385974402 |
| Mean | 66.39629295 |
| Median Absolute Deviation (MAD) | 10.74217213 |
| Skewness | -0.08303067408 |
| Sum | 206758.0562 |
| Variance | 261.6308975 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 86.99097046 | 1 | < 0.1% | |
| 56.71550955 | 1 | < 0.1% | |
| 77.73081437 | 1 | < 0.1% | |
| 90.39489472 | 1 | < 0.1% | |
| 37.78709664 | 1 | < 0.1% | |
| 78.9255271 | 1 | < 0.1% | |
| 89.47771837 | 1 | < 0.1% | |
| 69.526718 | 1 | < 0.1% | |
| 72.57395938 | 1 | < 0.1% | |
| 57.78086932 | 1 | < 0.1% | |
| 83.27758326 | 1 | < 0.1% | |
| 70.13560786 | 1 | < 0.1% | |
| 58.06246255 | 1 | < 0.1% | |
| 70.34610229 | 1 | < 0.1% | |
| 75.11488818 | 1 | < 0.1% | |
| 70.54721764 | 1 | < 0.1% | |
| 59.70020966 | 1 | < 0.1% | |
| 41.27592659 | 1 | < 0.1% | |
| 74.32689798 | 1 | < 0.1% | |
| 40.29059137 | 1 | < 0.1% | |
| 34.24621224 | 1 | < 0.1% | |
| 59.70801705 | 1 | < 0.1% | |
| 54.47393462 | 1 | < 0.1% | |
| 68.91118479 | 1 | < 0.1% | |
| 68.79917852 | 1 | < 0.1% | |
| Other values (3089) | 3089 | 94.3% | |
| (Missing) | 162 | 4.9% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0.738 | 1 | < 0.1% | |
| 8.175876384 | 1 | < 0.1% | |
| 8.577012933 | 1 | < 0.1% | |
| 14.34316145 | 1 | < 0.1% | |
| 15.6848768 | 1 | < 0.1% | |
| 16.2915046 | 1 | < 0.1% | |
| 17.00068293 | 1 | < 0.1% | |
| 17.52776496 | 1 | < 0.1% | |
| 17.91572257 | 1 | < 0.1% | |
| 18.01527236 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 124 | 1 | < 0.1% | |
| 120.030077 | 1 | < 0.1% | |
| 118.3572747 | 1 | < 0.1% | |
| 116.1616216 | 1 | < 0.1% | |
| 114.2086714 | 1 | < 0.1% | |
| 114.0349457 | 1 | < 0.1% | |
| 113.0488857 | 1 | < 0.1% | |
| 112.622733 | 1 | < 0.1% | |
| 112.4122104 | 1 | < 0.1% | |
| 112.0610274 | 1 | < 0.1% |
Turbidity
| Distinct | 3276 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.96678617 |
|---|---|
| Minimum | 1.45 |
| Maximum | 6.739 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 25.7 KiB |
Quantile statistics
| Minimum | 1.45 |
|---|---|
| 5-th percentile | 2.684279234 |
| Q1 | 3.43971087 |
| median | 3.955027563 |
| Q3 | 4.500319787 |
| 95-th percentile | 5.220924525 |
| Maximum | 6.739 |
| Range | 5.289 |
| Interquartile range (IQR) | 1.060608918 |
Descriptive statistics
| Standard deviation | 0.7803824085 |
|---|---|
| Coefficient of variation (CV) | 0.1967291341 |
| Kurtosis | -0.06280064052 |
| Mean | 3.96678617 |
| Median Absolute Deviation (MAD) | 0.5302962358 |
| Skewness | -0.007816642377 |
| Sum | 12995.19149 |
| Variance | 0.6089967035 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 2.963135381 | 1 | < 0.1% | |
| 3.987012091 | 1 | < 0.1% | |
| 4.066229364 | 1 | < 0.1% | |
| 3.759326201 | 1 | < 0.1% | |
| 4.876273 | 1 | < 0.1% | |
| 5.143750122 | 1 | < 0.1% | |
| 4.513200539 | 1 | < 0.1% | |
| 4.20418585 | 1 | < 0.1% | |
| 4.586748359 | 1 | < 0.1% | |
| 4.910911021 | 1 | < 0.1% | |
| 4.894829651 | 1 | < 0.1% | |
| 5.327584532 | 1 | < 0.1% | |
| 2.817258117 | 1 | < 0.1% | |
| 2.816074739 | 1 | < 0.1% | |
| 5.123871935 | 1 | < 0.1% | |
| 3.56360986 | 1 | < 0.1% | |
| 3.395331743 | 1 | < 0.1% | |
| 4.515322334 | 1 | < 0.1% | |
| 3.91599081 | 1 | < 0.1% | |
| 2.962432971 | 1 | < 0.1% | |
| 4.584565663 | 1 | < 0.1% | |
| 3.164187994 | 1 | < 0.1% | |
| 4.822958046 | 1 | < 0.1% | |
| 3.85115424 | 1 | < 0.1% | |
| 3.735983476 | 1 | < 0.1% | |
| Other values (3251) | 3251 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1.45 | 1 | < 0.1% | |
| 1.492206615 | 1 | < 0.1% | |
| 1.496100943 | 1 | < 0.1% | |
| 1.64151501 | 1 | < 0.1% | |
| 1.659799385 | 1 | < 0.1% | |
| 1.680554025 | 1 | < 0.1% | |
| 1.687624505 | 1 | < 0.1% | |
| 1.801326999 | 1 | < 0.1% | |
| 1.81252894 | 1 | < 0.1% | |
| 1.844371604 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 6.739 | 1 | < 0.1% | |
| 6.494748556 | 1 | < 0.1% | |
| 6.494249467 | 1 | < 0.1% | |
| 6.389161009 | 1 | < 0.1% | |
| 6.35743852 | 1 | < 0.1% | |
| 6.307678472 | 1 | < 0.1% | |
| 6.226580405 | 1 | < 0.1% | |
| 6.204846359 | 1 | < 0.1% | |
| 6.099631873 | 1 | < 0.1% | |
| 6.083772354 | 1 | < 0.1% |
Potability
Boolean
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 25.7 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1998 | 61.0% | |
| 1 | 1278 | 39.0% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
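As an illustration of the calculation described above, here is a minimal pure-Python sketch. The function name `pearson_r` and the sample data are assumptions made for this example; it is not the code the profiling tool runs internally.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: ys is an exact linear function of xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

r = pearson_r(xs, ys)
# Shifting xs and rescaling it by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)
# Negating one variable flips the sign of r.
r_flipped = pearson_r(xs, [-y for y in ys])

print(r, r_rescaled, r_flipped)
```

The rescaled case shows the location and scale invariance mentioned in the text; the invariance holds for positive scale factors, while a negative factor reverses the sign.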
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
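The rank-based calculation can be sketched in pure Python as follows. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the sample data are assumptions for illustration, and ties are given the mean of their positions, which is one common convention.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: a monotonic but nonlinear (cubic) relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 3 for x in xs]

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

On this cubic example ρ reaches its maximum while Pearson's r stays below 1, which is the difference between monotonic and linear correlation that the text describes.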
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
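The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` and the sample data are assumptions for illustration. Many libraries report the tie-corrected τ-b variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 means a tie, counted as neither.
    return (concordant - discordant) / len(pairs)

# Hypothetical data: one swapped pair out of six possible pairs.
xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]

tau = kendall_tau_a(xs, ys)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, so τ is (5 - 1) / 6.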
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
First rows
| | ph | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | Potability |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 204.890455 | 20791.318981 | 7.300212 | 368.516441 | 564.308654 | 10.379783 | 86.990970 | 2.963135 | 0 |
| 1 | 3.716080 | 129.422921 | 18630.057858 | 6.635246 | NaN | 592.885359 | 15.180013 | 56.329076 | 4.500656 | 0 |
| 2 | 8.099124 | 224.236259 | 19909.541732 | 9.275884 | NaN | 418.606213 | 16.868637 | 66.420093 | 3.055934 | 0 |
| 3 | 8.316766 | 214.373394 | 22018.417441 | 8.059332 | 356.886136 | 363.266516 | 18.436524 | 100.341674 | 4.628771 | 0 |
| 4 | 9.092223 | 181.101509 | 17978.986339 | 6.546600 | 310.135738 | 398.410813 | 11.558279 | 31.997993 | 4.075075 | 0 |
| 5 | 5.584087 | 188.313324 | 28748.687739 | 7.544869 | 326.678363 | 280.467916 | 8.399735 | 54.917862 | 2.559708 | 0 |
| 6 | 10.223862 | 248.071735 | 28749.716544 | 7.513408 | 393.663396 | 283.651634 | 13.789695 | 84.603556 | 2.672989 | 0 |
| 7 | 8.635849 | 203.361523 | 13672.091764 | 4.563009 | 303.309771 | 474.607645 | 12.363817 | 62.798309 | 4.401425 | 0 |
| 8 | NaN | 118.988579 | 14285.583854 | 7.804174 | 268.646941 | 389.375566 | 12.706049 | 53.928846 | 3.595017 | 0 |
| 9 | 11.180284 | 227.231469 | 25484.508491 | 9.077200 | 404.041635 | 563.885481 | 17.927806 | 71.976601 | 4.370562 | 0 |
Last rows
| | ph | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | Potability |
|---|---|---|---|---|---|---|---|---|---|---|
| 3266 | 8.372910 | 169.087052 | 14622.745494 | 7.547984 | NaN | 464.525552 | 11.083027 | 38.435151 | 4.906358 | 1 |
| 3267 | 8.989900 | 215.047358 | 15921.412018 | 6.297312 | 312.931022 | 390.410231 | 9.899115 | 55.069304 | 4.613843 | 1 |
| 3268 | 6.702547 | 207.321086 | 17246.920347 | 7.708117 | 304.510230 | 329.266002 | 16.217303 | 28.878601 | 3.442983 | 1 |
| 3269 | 11.491011 | 94.812545 | 37188.826022 | 9.263166 | 258.930600 | 439.893618 | 16.172755 | 41.558501 | 4.369264 | 1 |
| 3270 | 6.069616 | 186.659040 | 26138.780191 | 7.747547 | 345.700257 | 415.886955 | 12.067620 | 60.419921 | 3.669712 | 1 |
| 3271 | 4.668102 | 193.681735 | 47580.991603 | 7.166639 | 359.948574 | 526.424171 | 13.894419 | 66.687695 | 4.435821 | 1 |
| 3272 | 7.808856 | 193.553212 | 17329.802160 | 8.061362 | NaN | 392.449580 | 19.903225 | NaN | 2.798243 | 1 |
| 3273 | 9.419510 | 175.762646 | 33155.578218 | 7.350233 | NaN | 432.044783 | 11.039070 | 69.845400 | 3.298875 | 1 |
| 3274 | 5.126763 | 230.603758 | 11983.869376 | 6.303357 | NaN | 402.883113 | 11.168946 | 77.488213 | 4.708658 | 1 |
| 3275 | 7.874671 | 195.102299 | 17404.177061 | 7.509306 | NaN | 327.459760 | 16.140368 | 78.698446 | 2.309149 | 1 |